Bioinformatics (Thomas Dandekar, Meik Kunz)

already correctly reflect the different activities of the metabolic pathways (which is only

true on statistical average or for sufficiently large networks).

Finally, even the semiquantitative models for signal modeling use heuristics; in particular,

the kinetics are estimated only from the Boolean network of the process to be modeled.

This allows me to get started with such a model when little is known in detail about the

speed and nature of the proteins, enzymes, kinases, etc. involved.
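
To make this idea concrete, here is a minimal sketch of how a Boolean network can be turned into a semiquantitative model. It is an illustration under stated assumptions, not the method of any particular tool: the toy network (signal, kinase, target, inhibitor) is hypothetical, and so are the interpolation of the rules (AND becomes a product, NOT becomes 1 - x), the time constant tau and the step size dt.

```python
def boolean_step(state):
    """Synchronous Boolean update of a hypothetical toy signaling network."""
    return {
        "signal": state["signal"],        # external input, held fixed
        "kinase": state["signal"],        # kinase is ON when the signal is ON
        # target needs the kinase and is blocked by the inhibitor
        "target": state["kinase"] and not state["inhibitor"],
        "inhibitor": state["inhibitor"],  # external input, held fixed
    }


def continuous_step(x, dt=0.1, tau=1.0):
    """One Euler step of dx/dt = (rule(x) - x) / tau with interpolated rules."""
    rule = {
        "signal": x["signal"],
        "kinase": x["signal"],
        # assumed interpolation: AND -> product, NOT -> 1 - x
        "target": x["kinase"] * (1.0 - x["inhibitor"]),
        "inhibitor": x["inhibitor"],
    }
    return {name: x[name] + dt * (rule[name] - x[name]) / tau for name in x}


def simulate(x0, steps=200, dt=0.1, tau=1.0):
    """Run the continuous model for a number of Euler steps."""
    x = dict(x0)
    for _ in range(steps):
        x = continuous_step(x, dt, tau)
    return x


if __name__ == "__main__":
    start = {"signal": 1.0, "kinase": 0.0, "target": 0.0, "inhibitor": 0.0}
    print(simulate(start))
```

In the Boolean version the target switches on abruptly after two update steps. In the continuous version the activities are numbers between 0 and 1 that rise gradually, although no measured rate constant was needed to set the model up.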

How can you now program a heuristic search yourself?

The BioPerl and BioJava modules (https://bioperl.org/, https://biojava.org/) at the EBI

(European Bioinformatics Institute) are a good way to quickly program a heuristic search,

a simple program, or a larger program composed of simple parts. They provide

ready-written modules (program parts) for input and output, but also for web servers and

database searches. The Perl Cookbook (Christiansen and Torkington 2003)

offers many tips for concrete implementation in the Perl programming language.

Further tips can be found in other publications (Angly et al., 2014; Vos et al., 2011;

Stajich et al., 2002; Tisdall et al., 2001).

For calculations, “Numerical Recipes” (https://numerical.recipes) is a real treasure

trove. Originally a book (Press et al., 2007), it now explains online, in a clear way, how I

can quickly and easily carry out small calculations or even surprisingly complex ones that

come up again and again in many problems. As in a cooking recipe, the principles

are explained and code is provided, for example to make Matlab code run faster

(tutorial: https://numerical.recipes/nr3_matlab.html) or to use C++ code for even faster

calculations instead. Examples of applications for these numerical recipes, also in bioinformatics,

include efficient matrix and vector calculations (to calculate metabolic fluxes efficiently),

routines for geometric tasks (to calculate protein structures), and the generation

of random numbers (for population simulations in ecology).
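
As a small illustration of the matrix and vector calculations mentioned for metabolic fluxes, the following sketch checks whether a flux vector leaves all metabolite concentrations unchanged, i.e. whether the stoichiometric matrix times the flux vector is zero. It is plain Python written for this example, not code from Numerical Recipes; the linear toy pathway (v1 produces A, v2 converts A into B, v3 removes B) and all names are hypothetical.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("each row must have as many entries as the vector")
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# Hypothetical toy pathway; columns are the fluxes v1, v2, v3.
STOICHIOMETRY = [
    [1, -1, 0],   # metabolite A: produced by v1, consumed by v2
    [0, 1, -1],   # metabolite B: produced by v2, consumed by v3
]


def is_steady_state(fluxes, tol=1e-9):
    """True if no metabolite accumulates or is depleted for these fluxes."""
    return all(abs(change) <= tol for change in mat_vec(STOICHIOMETRY, fluxes))


if __name__ == "__main__":
    print(mat_vec(STOICHIOMETRY, [2, 2, 2]), is_steady_state([2, 2, 2]))
    print(mat_vec(STOICHIOMETRY, [2, 1, 1]), is_steady_state([2, 1, 1]))
```

For real networks with hundreds of reactions, an optimized numerical library would replace the hand-written loop, which is exactly where ready-made numerical routines pay off.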

Conclusion

In this chapter we have tried to look a little behind the façade of the fast bioinformatics

programs on the net, such as the BLAST server at the NCBI (National Center for

Biotechnology Information) in Bethesda, near Washington. In most cases, you can get an answer in

seconds to a few minutes. This is made possible by fast but not entirely accurate

searches (heuristics), and we have seen some tricks for doing this. For example, in

BLAST, the heuristic is to first find two short but perfect matches in the same

database entry before I check over the whole sequence length to see what the similarity

is to the query sequence.
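
The two-hit idea can be sketched in a few lines. This is a strongly simplified illustration, not the BLAST implementation: it only accepts exact word matches, whereas BLAST also scores similar words, and the word length k, the window size and all function names are assumptions made for this example. A database entry becomes a candidate only if two non-overlapping word hits lie on the same diagonal (same offset between query and entry) within the window.

```python
from collections import defaultdict


def index_words(sequence, k):
    """Map every word of length k in the sequence to its start positions."""
    index = defaultdict(list)
    for pos in range(len(sequence) - k + 1):
        index[sequence[pos:pos + k]].append(pos)
    return index


def has_two_hits(query_index, subject, k, window):
    """True if two non-overlapping exact word hits share a diagonal
    and lie at most `window` positions apart."""
    last_hit = {}  # diagonal -> subject position of the last recorded hit
    for s_pos in range(len(subject) - k + 1):
        for q_pos in query_index.get(subject[s_pos:s_pos + k], ()):
            diagonal = s_pos - q_pos
            previous = last_hit.get(diagonal)
            if previous is None:
                last_hit[diagonal] = s_pos
                continue
            distance = s_pos - previous
            if distance < k:
                continue              # overlaps the earlier hit, ignore it
            if distance <= window:
                return True           # second hit close enough: candidate
            last_hit[diagonal] = s_pos  # too far away, start again from here
    return False


def candidate_entries(query, database, k=3, window=40):
    """Names of database entries that pass the two-hit filter."""
    query_index = index_words(query, k)
    return [name for name, subject in database.items()
            if has_two_hits(query_index, subject, k, window)]


if __name__ == "__main__":
    toy_database = {  # hypothetical sequences
        "similar": "GGGMKTAYIAKQRQISFVKGGG",
        "one_word_only": "CCCCMKTCCCCCCCC",
    }
    print(candidate_entries("MKTAYIAKQRQISFVKSHFSRQ", toy_database))
```

Only the entries returned by the filter would then be aligned over their whole length, which is the expensive step that the heuristic avoids for most of the database.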

It is equally important to make the database (e.g. GenBank, UniProt) quickly readable,

for example by indexing it (after all, you find something in this book much more quickly via

the table of contents than by leafing through it). In addition to speed, sensitivity (do I

recognise all relevant entries?) and specificity (do I not get too many non-relevant

entries?) are also important for a good search.
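
Both quality measures can be computed directly once it is known which entries are truly relevant. The sketch below uses the usual definitions (sensitivity = true positives / all relevant entries, specificity = true negatives / all non-relevant entries); the function name and the toy entry identifiers are hypothetical.

```python
def sensitivity_specificity(found, relevant, all_entries):
    """Return (sensitivity, specificity) of a search result."""
    found, relevant, all_entries = set(found), set(relevant), set(all_entries)
    true_pos = len(found & relevant)                 # relevant and reported
    false_neg = len(relevant - found)                # relevant but missed
    false_pos = len(found - relevant)                # reported but not relevant
    true_neg = len(all_entries - found - relevant)   # correctly left out
    sensitivity = (true_pos / (true_pos + false_neg)
                   if true_pos + false_neg else float("nan"))
    specificity = (true_neg / (true_neg + false_pos)
                   if true_neg + false_pos else float("nan"))
    return sensitivity, specificity


if __name__ == "__main__":
    entries = [f"e{i}" for i in range(1, 11)]   # ten hypothetical entries
    relevant = ["e1", "e2", "e3", "e4"]
    found = ["e1", "e2", "e3", "e5"]            # one miss, one false hit
    print(sensitivity_specificity(found, relevant, entries))
```

A faster heuristic typically lowers the sensitivity a little, so it is worth measuring both values on a test set with known answers before relying on a search.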

6 Extremely Fast Sequence Comparisons Identify All the Molecules That Are Present…